Search Results for "llm inference"
Together AI Leverages AI Agents for Complex Engineering Automation
Together AI uses AI agents to automate complex engineering tasks, optimizing its LLM inference systems and reducing manual intervention, the company says.
Together AI Achieves 40% Faster LLM Inference With Cache-Aware Architecture
Together AI's new CPD system separates warm and cold inference workloads, delivering 35-40% higher throughput for long-context AI applications on NVIDIA B200 GPUs.
NVIDIA Run:ai GPU Fractioning Delivers 77% Throughput at Half Allocation
NVIDIA and Nebius benchmarks show GPU fractioning achieves 86% user capacity on 0.5 GPU allocation, enabling 3x more concurrent users for mixed AI workloads.
NVIDIA Advances AI Infrastructure With Disaggregated LLM Inference on Kubernetes
NVIDIA details new Kubernetes deployment patterns for disaggregated LLM inference using Dynamo and Grove, promising better GPU utilization for AI workloads.
Ray Serve Upgrade Delivers 88% Lower Latency for AI Inference at Scale
Anyscale announces major Ray Serve optimizations with HAProxy and gRPC, achieving 11.1x throughput gains for LLM inference workloads on enterprise deployments.
Alibaba Unveils Its First Home-Grown AI Chip
Chinese e-commerce giant Alibaba unveiled its first artificial intelligence inference chip on Wednesday, a move that could further invigorate its already fast-growing cloud computing business.